Smart Paper Notes

Coupling streaming AI and HPC ensembles to achieve 100-1000x faster biomolecular simulations

Alexander Brace, Igor Yakushin, Heng Ma, Anda Trifan, Todd Munson, Ian Foster, Arvind Ramanathan, Hyungro Lee, Matteo Turilli, Shantenu Jha

Category: Machine Learning

2021-04-10

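Machine learning (ML)-based steering can improve the performance of ensemble-based simulations by allowing online selection of more scientifically meaningful computations. We present DeepDriveMD, a framework for ML-driven steering of scientific simulations that we have used to achieve orders-of-magnitude improvements in molecular dynamics (MD) performance via effective coupling of ML and HPC on large parallel computers. We discuss the design of DeepDriveMD and characterize its performance. We demonstrate that DeepDriveMD can achieve between 100x and 1000x acceleration relative to other methods, as measured by the amount of simulated time performed, while covering the same conformational landscape as quantified by the states sampled during a simulation. Experiments are performed on leadership-class platforms on up to 1020 nodes. The results establish DeepDriveMD as a high-performance framework for ML-driven HPC simulation scenarios that supports diverse MD simulation and ML back-ends, and that enables new scientific insights by improving the length and time scales accessible with current computing capacity.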
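
The abstract describes ML steering of an ensemble only at a high level. The loop below is a minimal sketch of that general run-learn-restart pattern under loudly stated assumptions: a 1D random walk (the hypothetical `run_segment`) stands in for a short MD simulation, and a visit-count table (the hypothetical `select_restarts`) stands in for the trained ML model that picks under-sampled states as restart points. None of these names come from DeepDriveMD itself.

```python
import random
from collections import Counter


def run_segment(start, n_steps, rng):
    """Hypothetical stand-in for one short MD run: a 1D random walk from `start`."""
    x, traj = start, [start]
    for _ in range(n_steps):
        x += rng.choice((-1, 1))
        traj.append(x)
    return traj


def select_restarts(visits, n_select):
    """Hypothetical stand-in for the ML step: rank states from least to most visited
    (ties broken by state value) and reuse the ranking cyclically if it is too short."""
    ranked = sorted(visits, key=lambda s: (visits[s], s))
    return [ranked[i % len(ranked)] for i in range(n_select)]


def steered_ensemble(n_rounds=5, n_sims=4, n_steps=20, seed=0):
    rng = random.Random(seed)
    visits = Counter()
    starts = [0] * n_sims
    for _ in range(n_rounds):
        # 1. Run the ensemble (these would be parallel MD tasks on an HPC system).
        for s in starts:
            visits.update(run_segment(s, n_steps, rng))
        # 2. Update the "model" (here just visit counts) and pick new starting states.
        starts = select_restarts(visits, n_sims)
    return visits


visits = steered_ensemble()
print(f"{len(visits)} distinct states, {sum(visits.values())} frames")
```

Restarting from rarely visited states is what pushes the ensemble into new regions; in the real framework a learned model replaces the visit counts.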

translated by Google Translate

Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

Yuting Guo, Swati Rajwal, Sahithi Lakamana, Chia-Chun Chiang, Paul C. Menell, Adnan H. Shahid, Yi-Chieh Chen, Nikita Chhabra, Wan-Ju Chao, Chieh-Ju Chao

Category: Natural Language Processing

2022-12-23

Migraine is a high-prevalence and disabling neurological disorder. However, information about migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.
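
The reported F1 scores combine precision and recall into a single number. A minimal sketch of the computation, assuming binary labels where 1 marks a self-reported migraine post and F1 is taken on that positive class (the label encoding and the toy data are hypothetical, not the study's annotations):

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))  # precision = recall = 0.75, so F1 = 0.75
```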

Scale-Invariant Specifications for Human-Swarm Systems

Joel Meyer, Ahalya Prabhakar, Allison Pinosky, Ian Abraham, Annalisa Taylor, Millicent Schlafly, Katarina Popovic, Giovani Diniz, Brendan Teich, Borislava Simidchieva

Category: Robotics

2022-12-06

We present a method for controlling a swarm using its spectral decomposition -- that is, by describing the set of trajectories of a swarm in terms of a spatial distribution throughout the operational domain -- guaranteeing scale invariance with respect to the number of agents both for computation and for the operator tasked with controlling the swarm. We use ergodic control, decentralized across the network, for implementation. In the DARPA OFFSET program field setting, we test this interface design for the operator using the STOMP interface -- the same interface used by Raytheon BBN throughout the duration of the OFFSET program. In these tests, we demonstrate that our approach is scale-invariant -- the user specification does not depend on the number of agents; it is persistent -- the specification remains active until the user specifies a new command; and it is real-time -- the user can interact with and interrupt the swarm at any time. Moreover, we show that the spectral/ergodic specification of swarm behavior degrades gracefully as the number of agents goes down, enabling the operator to maintain the same approach as agents become disabled or are added to the network. We demonstrate the scale invariance and dynamic response of our system in a field-relevant simulator on a variety of tactical scenarios with up to 50 agents. We also demonstrate the dynamic response of our system in the field with a smaller team of agents. Lastly, we make the code for our system available.
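
The key idea is that the operator specifies a spatial distribution rather than per-agent paths. As a minimal sketch, assuming the standard spectral ergodic metric (cosine coefficients weighted by (1 + k^2)^(-(d+1)/2)) rather than the authors' exact formulation, the code compares the coefficients of a target distribution on [0, 1] with coefficients averaged over all agent positions pooled together. The specification `phi` is computed once from the target and scored unchanged for 10 or 50 agents, which mirrors the scale invariance described above. All function names are hypothetical, and this is a 1D metric only, not the decentralized controller.

```python
import numpy as np


def basis(k, x):
    """Orthonormal cosine basis on [0, 1]: f_0 = 1, f_k = sqrt(2) cos(k pi x)."""
    return np.ones_like(x) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * x)


def target_coeffs(density, n_coeffs, grid_size=1000):
    """Spectral specification of the target distribution (independent of agent count)."""
    x = (np.arange(grid_size) + 0.5) / grid_size  # midpoint grid on [0, 1]
    p = density(x)
    p = p / p.mean()                               # normalize so the density integrates to 1
    return np.array([np.mean(p * basis(k, x)) for k in range(n_coeffs)])


def trajectory_coeffs(positions, n_coeffs):
    """Coefficients averaged over every sample, whatever the number of agents."""
    pos = np.asarray(positions, dtype=float).ravel()
    return np.array([np.mean(basis(k, pos)) for k in range(n_coeffs)])


def ergodic_metric(positions, phi_k):
    k = np.arange(len(phi_k))
    weights = (1.0 + k ** 2) ** -1.0               # (1 + k^2)^(-(d+1)/2) with d = 1
    c_k = trajectory_coeffs(positions, len(phi_k))
    return float(np.sum(weights * (c_k - phi_k) ** 2))


phi = target_coeffs(lambda x: np.ones_like(x), n_coeffs=10)  # uniform coverage
for n_agents in (10, 50):
    spread = (np.arange(n_agents) + 0.5) / n_agents           # agents evenly spread out
    print(n_agents, ergodic_metric(spread, phi))              # close to 0 for both
print(ergodic_metric(np.full(50, 0.5), phi))                  # all agents bunched: large
```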

Bayesian Semiparametric Model for Sequential Treatment Decisions with Informative Timing

Arman Oganisian, Kelly D. Getz, Todd A. Alonzo, Richard Aplenc, Jason A. Roy

Category: Machine Learning | Machine Learning (Statistics)

2022-11-29

We develop a Bayesian semi-parametric model for estimating the impact of dynamic treatment rules on survival among patients diagnosed with pediatric acute myeloid leukemia (AML). The data consist of a subset of patients enrolled in the phase III AAML1031 clinical trial in which patients move through a sequence of four treatment courses. At each course, they undergo treatment that may or may not include anthracyclines (ACT). While ACT is known to be effective at treating AML, it is also cardiotoxic and can lead to early death for some patients. Our task is to estimate the potential survival probability under hypothetical dynamic ACT treatment strategies, but there are several impediments. First, since ACT was not randomized in the trial, its effect on survival is confounded over time. Second, subjects initiate the next course depending on when they recover from the previous course, making timing potentially informative of subsequent treatment and survival. Third, patients may die or drop out before ever completing the full treatment sequence. We develop a generative Bayesian semi-parametric model based on Gamma Process priors to address these complexities. At each treatment course, the model captures subjects' transition to subsequent treatment or death in continuous time under a given rule. A g-computation procedure is used to compute a posterior over potential survival probability that is adjusted for time-varying confounding. Using this approach, we conduct posterior inference for the efficacy of hypothetical treatment rules that dynamically modify ACT based on evolving cardiac function.
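
G-computation estimates survival under a rule by simulating patients forward from the fitted transition models while the rule assigns treatment from each patient's evolving history. The sketch below is a deliberately simplified Monte Carlo version under stated assumptions: discrete courses instead of continuous time, no dropout, and fixed, made-up probabilities standing in for one posterior draw of the paper's Gamma-process model (repeating it per posterior draw would give a posterior over survival). The function names and every number are hypothetical.

```python
import random

N_COURSES = 4  # the trial has four treatment courses


def simulate_patient(rule, rng):
    """Simulate one patient forward under a dynamic ACT rule.
    Every probability is made up; in the paper these transitions come from the fitted model."""
    cardiac_ok = True
    for _ in range(N_COURSES):
        act = rule(cardiac_ok)               # the rule sees current cardiac function
        p_death = 0.05 if act else 0.10      # hypothetical: ACT lowers AML-related risk...
        if not cardiac_ok:
            p_death += 0.05                  # ...but poor cardiac function raises it
        if rng.random() < p_death:
            return False                     # died during this course
        if act and rng.random() < 0.2:
            cardiac_ok = False               # hypothetical cardiotoxicity of ACT
    return True                              # survived all courses


def g_computation(rule, n_sims=20000, seed=0):
    """Monte Carlo g-computation: average survival over simulated patients."""
    rng = random.Random(seed)
    return sum(simulate_patient(rule, rng) for _ in range(n_sims)) / n_sims


rules = {
    "never ACT": lambda ok: False,
    "always ACT": lambda ok: True,
    "ACT only while cardiac function is OK": lambda ok: ok,
}
for name, rule in rules.items():
    print(f"{name}: {g_computation(rule):.3f}")
```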

ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

Andac Demir, Baris Coskunuzer, Ignacio Segovia-Dominguez, Yuzhou Chen, Yulia Gel, Bulent Kiziltan

Category: Machine Learning | Artificial Intelligence

2022-11-07

In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).
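
Fine-tuning with a triplet margin loss pulls an anchor toward a compound of the same class and pushes it away from a compound of another class by at least a margin; ranking then sorts candidates by embedding distance. A hedged sketch of both pieces, assuming Euclidean distance, with small 2D vectors as hypothetical placeholders for the MP-signature embeddings (this is the generic loss, not the authors' training code):

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean of max(0, d(a, p) - d(a, n) + margin) over a batch of triplets."""
    d_ap = np.linalg.norm(anchor - positive, axis=-1)
    d_an = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.mean(np.maximum(0.0, d_ap - d_an + margin)))


def rank_by_distance(query, candidates):
    """Indices of candidate embeddings, closest to the query (e.g. a known active) first."""
    d = np.linalg.norm(candidates - query, axis=-1)
    return np.argsort(d).tolist()


anchors = np.array([[0.0, 0.0], [0.0, 0.0]])
positives = np.array([[0.0, 1.0], [0.0, 1.0]])
negatives = np.array([[3.0, 0.0], [1.0, 0.0]])  # first negative already far enough, second is not
print(triplet_margin_loss(anchors, positives, negatives))  # (0 + 1) / 2 = 0.5
print(rank_by_distance(np.array([0.0, 0.0]),
                       np.array([[3.0, 0.0], [1.0, 0.0], [0.0, 2.0]])))  # [1, 2, 0]
```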

Scale-Invariant Fast Functional Registration

Muchen Sun, Allison Pinosky, Ian Abraham, Todd Murphey

Category: Computer Vision | Robotics

2022-09-26

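Functional registration algorithms represent point clouds as functions (e.g., a spatial occupancy field), avoiding the unreliable correspondence estimation of conventional least-squares registration algorithms. However, existing functional registration algorithms are computationally expensive. Furthermore, the capability of registration with unknown scale is necessary in tasks such as CAD model-based object localization, yet no such support exists in functional registration. In this work, we propose a scale-invariant functional registration algorithm with linear time complexity. We achieve linear time complexity through an efficient approximation of the L2 distance between functions using orthonormal basis functions. The use of orthonormal basis functions leads to a formulation that is compatible with least-squares registration. Benefiting from the least-squares formulation, we use the theory of translation- and rotation-invariant measurements to decouple scale estimation and therefore achieve scale-invariant registration. We evaluate the proposed algorithm, named FLS (functional least-squares), on standard 3D registration benchmarks, showing that FLS is an order of magnitude faster than the state-of-the-art functional registration algorithm without compromising accuracy and robustness. FLS also outperforms the state-of-the-art correspondence-based least-squares registration algorithm in accuracy and robustness, with both known and unknown scale. Finally, we demonstrate applying FLS to register point clouds with varying densities and partial overlaps, point clouds from different objects within the same category, and point clouds from real-world objects with noisy RGB-D measurements.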
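
Why orthonormal bases give linear time: each coefficient of a point cloud's expansion is an average over its N points, so K coefficients cost O(N K), and by Parseval's identity the squared L2 distance between two truncated expansions equals the squared distance between their coefficient vectors. The 1D sketch below illustrates only that idea, under clear simplifications: a cosine basis on [0, 1], the truncated expansion of the empirical point distribution as the "function", and a brute-force search over translations in place of FLS's least-squares solver. Function names are hypothetical.

```python
import numpy as np

N_COEFFS = 20


def coeffs(points, n_coeffs=N_COEFFS):
    """Cosine-basis coefficients of the point cloud's empirical distribution on [0, 1].
    Cost is O(N * K) for N points and K basis functions."""
    pts = np.asarray(points, dtype=float)
    k = np.arange(n_coeffs)[:, None]                       # shape (K, 1)
    phi = np.sqrt(2.0) * np.cos(k * np.pi * pts[None, :])  # shape (K, N)
    phi[0] = 1.0                                           # f_0 = 1
    return phi.mean(axis=1)


def l2_distance_sq(a, b):
    # Parseval: for truncated expansions in the same orthonormal basis, the squared
    # L2 distance between functions is the squared distance between coefficients.
    return float(np.sum((a - b) ** 2))


def register_translation(source, target, shifts):
    """Brute-force 1D translation search (a simplification, not the FLS solver)."""
    target_c = coeffs(target)
    costs = [l2_distance_sq(coeffs(np.asarray(source) + t), target_c) for t in shifts]
    return float(shifts[int(np.argmin(costs))])


source = np.array([0.1, 0.2, 0.35, 0.5])
target = source + 0.1
print(register_translation(source, target, np.linspace(-0.3, 0.3, 61)))  # about 0.1
```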

UNav: An Infrastructure-Independent Vision-Based Navigation System for People with Blindness and Low Vision

Anbang Yang, Mahya Beheshti, Todd E Hudson, Rajesh Vedanthan, Wachara Riewpaiboon, Pattanasak Mongkolwat, Chen Feng, John-Ross Rizzo

Category: Computer Vision

2022-09-22

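Vision-based localization approaches now underpin newly emerging navigation pipelines for myriad use cases, from robotics to assistive technologies. Compared to sensor-based solutions, vision-based localization does not require pre-installed sensor infrastructure, which is costly, time-consuming, and/or often infeasible. Herein, we propose a vision-based localization pipeline for a specific use case: navigation support for end-users with blindness and low vision. Given a query image taken by an end-user on a mobile application, the pipeline leverages a visual place recognition (VPR) algorithm to find similar images in a reference image database of the target space. The geolocations of these similar images are used in downstream tasks that employ a weighted-average method to estimate the end-user's location and a perspective-n-point (PnP) algorithm to estimate the end-user's direction. Additionally, the system implements Dijkstra's algorithm to calculate a shortest path on a navigable map that includes the trip origin and destination. The map used for localization and navigation is built with a customized graphical user interface that projects a 3D reconstructed sparse map, built from a sequence of images, onto the corresponding a priori 2D floor plan. The sequential images used for map construction can be collected in a pre-mapping step or scavenged from public databases and citizen science. The end-to-end system can be installed on any internet-accessible device with a camera that hosts a custom mobile application. For evaluation, mapping and localization were tested in a complex hospital environment. The results show that our system can achieve localization with an average error of less than 1 meter without knowledge of the camera's intrinsic parameters, such as focal length.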
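
Two of the pipeline's steps are classic enough to sketch directly: averaging the geolocations of the retrieved reference images, and Dijkstra's shortest path on the navigable map. The sketch assumes the averaging weights are retrieval similarity scores and that the map is an adjacency list with metric edge lengths; both are assumptions, and the node names and coordinates are made up.

```python
import heapq


def weighted_location(matches):
    """matches: [((x, y), weight), ...] for the retrieved reference images."""
    total = sum(w for _, w in matches)
    x = sum(p[0] * w for p, w in matches) / total
    y = sum(p[1] * w for p, w in matches) / total
    return x, y


def dijkstra(graph, source, target):
    """graph: {node: [(neighbor, length), ...]}. Returns (cost, path)."""
    dist, prev = {source: 0.0}, {}
    heap, visited = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue                      # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


print(weighted_location([((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]))  # (1.5, 0.0)

graph = {}
for u, v, w in [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0), ("C", "D", 1.0)]:
    graph.setdefault(u, []).append((v, w))  # corridors are walkable both ways
    graph.setdefault(v, []).append((u, w))
print(dijkstra(graph, "A", "D"))  # (4.0, ['A', 'B', 'C', 'D'])
```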

Exploring Code Style Transfer with Neural Networks

Karl Munson, Anish Savla, Chih-Kai Ting, Serenity Wade, Kiran Kate, Kavitha Srinivas

Category: Natural Language Processing

2022-09-13

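Style is a significant component of natural language text, reflecting a change in the tone of text while keeping the underlying information the same. Even though programming languages have strict syntax rules, they also have style. Code can be written with the same functionality but using different language features. However, programming style is difficult to quantify, and thus, as part of this work, we define style attributes specifically for Python. To build a definition of style, we use hierarchical clustering to capture style definitions without needing to specify transformations. In addition to defining style, we explore the capability of pre-trained code language models to capture information about code style. To do this, we fine-tune pre-trained code language models and evaluate their performance on code style transfer tasks.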
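
Style attributes become measurable once they are read off the syntax tree. The sketch below counts three features with Python's standard `ast` module; the chosen attributes (list comprehensions, `for` loops, docstrings) are illustrative assumptions, not the paper's attribute set. Feature vectors like these are the kind of input a hierarchical clustering step could group.

```python
import ast


def style_features(source):
    """Count a few hypothetical style attributes in a Python snippet."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        "list_comprehensions": sum(isinstance(n, ast.ListComp) for n in nodes),
        "for_loops": sum(isinstance(n, ast.For) for n in nodes),
        "docstrings": sum(
            1 for n in nodes
            if isinstance(n, (ast.Module, ast.FunctionDef, ast.ClassDef))
            and ast.get_docstring(n)
        ),
    }


# Same functionality, two styles.
loop_style = (
    "def squares(n):\n"
    "    out = []\n"
    "    for i in range(n):\n"
    "        out.append(i * i)\n"
    "    return out\n"
)
comp_style = (
    "def squares(n):\n"
    '    """Return the first n squares."""\n'
    "    return [i * i for i in range(n)]\n"
)
print(style_features(loop_style))  # one for loop, no comprehension, no docstring
print(style_features(comp_style))  # one comprehension, no for loop, one docstring
```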

On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models

Rohan Anil, Sandra Gadanho, Da Huang, Nijith Jacob, Zhuoshu Li, Dong Lin, Todd Phillips, Cristina Pop, Kevin Regan, Gil I. Shamir

Category: Machine Learning

2022-09-12

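For industrial-scale advertising systems, prediction of ad click-through rate (CTR) is a central problem. Ad clicks constitute a significant class of user engagement and are often used as the primary signal of an ad's usefulness to users. Additionally, in cost-per-click advertising systems, where advertisers pay per click, clicks directly feed into value estimation. Therefore, CTR model development is a significant investment for most Internet advertising companies. Engineering for such problems requires many machine learning (ML) techniques suited to online learning that go well beyond traditional accuracy improvements, especially concerning efficiency, reproducibility, calibration, and credit attribution. We present a case study of practical techniques deployed in Google's search ads CTR model. This paper provides an industry case study that highlights important areas of current ML research and illustrates how impactful new ML methods are evaluated and made useful in a large-scale industrial setting.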
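
Because clicks feed directly into value estimation, a CTR model must be calibrated, not just accurate. A small sketch of generic metrics, assuming a binary click label and a predicted CTR (pCTR) per impression: log loss scores per-impression quality, the ratio of predicted to observed clicks checks aggregate calibration, and the hypothetical `expected_value_per_impression` shows how any pCTR bias scales a cost-per-click value estimate one-for-one. These are illustrative formulas, not Google's production code.

```python
import math


def log_loss(clicks, pctr, eps=1e-15):
    """Mean binary cross-entropy between observed clicks and predicted CTR."""
    total = 0.0
    for y, p in zip(clicks, pctr):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(clicks)


def calibration_ratio(clicks, pctr):
    """Predicted clicks / observed clicks; 1.0 means calibrated in aggregate."""
    return sum(pctr) / sum(clicks)


def expected_value_per_impression(bid_per_click, pctr):
    """Hypothetical cost-per-click value estimate: bid times predicted click probability."""
    return bid_per_click * pctr


clicks = [1, 0, 0, 1]
print(log_loss(clicks, [0.5] * 4))                       # ln 2, about 0.693
print(calibration_ratio(clicks, [0.9, 0.1, 0.1, 0.9]))   # about 1.0: calibrated
print(calibration_ratio(clicks, [1.0] * 4))              # 2.0: over-predicts clicks 2x
```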

HealthyGAN: Learning from Unannotated Medical Images to Detect Anomalies Associated with Human Disease

Md Mahfuzur Rahman Siddiquee, Jay Shah, Teresa Wu, Catherine Chong, Todd Schwedt, Baoxin Li

Category: Computer Vision

2022-09-05

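Automated anomaly detection from medical images, such as MRIs and X-rays, can significantly reduce human effort in disease diagnosis. Owing to the complexity of modeling anomalies and the high cost of manual annotation by domain experts (e.g., radiologists), a typical technique in the current medical imaging literature focuses on deriving diagnostic models from healthy subjects only, assuming the model will detect images from patients as outliers. However, in many real-world scenarios, unannotated datasets with a mix of healthy and diseased individuals are abundant. Therefore, this paper poses the research question of how to improve unsupervised anomaly detection by utilizing (1) an unannotated set of mixed images, in addition to (2) the set of healthy images used in the literature. To answer this question, we propose HealthyGAN, a novel one-directional image-to-image translation method that learns to translate images from the mixed dataset to healthy images only. Being one-directional, HealthyGAN relaxes the cycle-consistency requirement of existing unpaired image-to-image translation methods, which is unattainable with mixed unannotated data. Once the translation is learned, we generate a difference map for any given image by subtracting its translated output. Regions of significant response in the difference map correspond to potential anomalies (if any). HealthyGAN outperforms conventional state-of-the-art methods on two publicly available datasets, COVID-19 and NIH ChestX-ray14, and on one institutional dataset collected from Mayo Clinic. The implementation is publicly available at https://github.com/mahfuzmohammad/healthygan.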
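
The detection step after training is simple array arithmetic: translate the image to its "healthy" version, subtract, and flag strong responses. A sketch with NumPy, where `toy_translator` is a hypothetical stand-in for the trained HealthyGAN generator (it just clips bright pixels), and taking the absolute difference and a fixed threshold are assumptions rather than the paper's exact post-processing:

```python
import numpy as np


def difference_map(image, translate_to_healthy):
    """Absolute difference between an image and its translated 'healthy' version."""
    return np.abs(image - translate_to_healthy(image))


def anomaly_mask(diff, threshold):
    """Pixels whose difference exceeds the threshold are potential anomalies."""
    return diff > threshold


def toy_translator(img):
    # Hypothetical stand-in for the generator: removes anything brighter than 0.3.
    return np.minimum(img, 0.3)


image = np.full((4, 4), 0.2)  # normal background
image[1:3, 1:3] = 0.9         # bright 2x2 "lesion"
mask = anomaly_mask(difference_map(image, toy_translator), threshold=0.5)
print(mask.astype(int))       # 1s only where the lesion was
```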
